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In this paper we offer a unified approach to the problem of non- 
parametric regression on the unit interval. It is based on a universal, 
honest and non-asymptotic confidence region An which is defined by 
a set of linear inequalities involving the values of the functions at the 
design points. Interest will typically centre on certain simplest func- 
tions in An where simplicity can be denned in terms of shape (num- 
ber of local extremes, intervals of convexity /concavity) or smoothness 
(bounds on derivatives) or a combination of both. Once some form 
of rcgularization has been decided upon the confidence region can be 
used to provide honest non-asymptotic confidence bounds which are 
less informative but conceptually much simpler. 



1. Introduction. Non-parametric regression on the unit interval is con- 
cerned with specifying functions /„ which are reasonable representations of a 
data set y n = {(ti,y(ti)),i = 1, . . . , n}. The design points ij are assumed to 
be ordered. Here and below we use lower case letters to denote generic data 
and upper case letters to denote data generated under a specific stochas- 
tic model. The first approach to the problem used kernel estimators with a 
fixed bandwidth (Watson, 1964) but since then many other procedures have 
been proposed. We mention splines (Green and Silverman, 1994; Wahba, 
1990), wavelets (Donoho and Johnstone, 1994), local polynomial regression 
(Fan and Gijbel, 1996), kernel estimators with local bandwidths (Wand and 
Jones, 1995) very often with Bayesian and non-Bayesian versions. 

The models on which the methods are based are of the form 

(1) Y(t) = f(t) + a(t)e(t), t€[0, 1] 

with various assumptions being made about a(t), the noise e(t) as well as 
the design points {t\, . . . , t n }. We shall restrict attention to the simplest case 

(2) Y(t) = f(t) + aZ(t), t€[0, 1] 
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where Z is Gaussian white noise and the ti are given by ti = i/n. We 
mention that the same ideas can be used for the more general model ([I]) and 
that robust versions are available. The central role in this paper is played 
by a confidence region A n which is defined below. It specifies all functions 
f n for which the model ([2]) is consistent (in a well-defined sense) with the 
data y n . By regularizing within A n we can control both the shape and 
the smoothness of a regression function and provide honest non-asymptotic 
confidence bounds. 

The paper is organized as follows. In Section [2] we define the confidence 
region A n and show that it is honest and non-asymptotic for data generated 
under ([2]). In Section [3] we consider shape regularization, in [5] regularization 
by smoothness as well as the combination of shape and smoothness regu- 
larization. Finally in Section [5] we show how honest and non-asymptotic 
confidence bounds can be obtained both for shape and smoothness regular- 
ization. 

2. The confidence region An- 

2.1. Non-parametric confidence regions. Much attention has been given 
to confidence sets in recent years. These sets are often expressed as a ball 
centred at some suitable estimate (Li, 1989; Hoffmann and Lepski, 2002; Ba- 
raud, 2004; Cai and Low, 2006; Robins and van der Vaart, 2006) with par- 
ticular emphasis on adaptive methods where the radius of the ball automati- 
cally decreases if / is sufficiently smooth. The concept of adaptive confidence 
balls is not without conceptual difficulties as the discussion of Hoffmann 
and Lepski (2002) shows. An alternative to smoothness is the imposition 
of shape constraints such as monotonicity and convexity (Diimbgen, 1998, 
2003; Diimbgen and Spokoiny, 2001; Diimbgen and Johns, 2004; Diimbgen, 
2007). Such confidence sets require only that / satisfy the shape constraint 
which often has some independent justification. 

We consider data Y n = Y n (f) generated under ([2]) and limit attention to 
functions / in some family T n . We call a confidence set C n (Y n (f), a) exact 
if 

(3) P(f eC n (Y n (f),a)) = a for all / G J- n , 
honest (Li, 1989) if 

(4) P(f eC n (Y n (f),a))>a for all / G F n , 
and asymptotically honest if 

(5) liminf inf P(f G C n (Y n (f),a)) > a 
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holds but it is not possible to specify the no for which the coverage proba- 
bility exceeds a — e for all n > no- Finally we call C n (Y n (f), a) universal if 
F n = {f:f: [0,1]-R}. 

2.2. Definition of A n . The confidence region A n we use was first given 
in Davies and Kovac (2001). It is constructed as follows. For any function 
g : [0, 1] — * R and any interval / = [tj, t^] of [0, 1] with j < k we write 

(6) w(y n ,g,I) = ^=J2(y^-9(ti)) 

where |/| denotes the number of points tj in /. With this notation 

(7) A n = A ) = {ff : max|w(y n ,g,I)| < ay/r n logn } 

where X n is a family of intervals of [0, 1] and for given a the value of r n = 
T n (a) is defined by 

(8) P( max — L I ^ Z(ti) < v 7 ^ log n ) = a. 



If the data y n were generated under ([2]) then ([8]) implies that P(f G ^4 n ) = a 
with no restrictions on / so that .A ra is a universal, exact a-confidence region. 
We mention that by using an appropriate norm (Mildenberger, 2006) A n can 
also be expressed as a ball centred at the observations y n . 

A function g belongs to A n if and only if its vector of evaluations at 
the design points (g(ti), . . . ,g(t n )) belongs to the convex polyhedron in W n 
which is defined by the linear inequalities 

' ^2(y{ti) - g(ti)) < o- n ^T n \ogn, iei n - 



The remainder of the paper is in one sense nothing more than exploring the 
consequences of these inequalities for shape and smoothness regularization. 
They enforce both local and global adaptivity to the data and they are 
tight in that they yield optimal rates of convergence for both shape and 
smoothness constraints. 

In the theoretical part of the paper we take I n to be the set of all intervals 
of the form [tj, tj]. For this choice of A n checking whether g € A n for a given 
g involves about n 2 /2 linear inequalities. Surprisingly there exist algorithms 
which allow this to be done with algorithmic complexity 0(n log n) (Bernholt 
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and Hofmeister, 2006). In practice we restrict X n to a multiresolution scheme 
as follows. For some A > 1 we set 

x n = {[ti( j: k), tuu,k)\ ■ 10, k) = l(j - l)A fc + 1J , 

(9) u(j, k) = min{ LjA fc J ,n},j = l,..., \nX~ k ] , 
k = l,..., [log nj log A] } . 

For any A > 1 we see that I n now contains O(n) intervals. For A = 2 
we get the wavelet multiresolution scheme which we use throughout the 
paper when doing the calculations for explicit data sets. If I n is the set 
of all possible intervals it follows from a result of Diimbgen and Spokoiny 
(2001) that linin^oo r n = 2 whatever the value of a. On the other hand for 
any I n which contains all the degenerate intervals [tj, tj] (as will always 
be the case) then lim ra _ >00 T n > 2 whatever a. In the following we simply 
take T n = 3 as our default value. This guarantees a coverage probability 
of at least a = 0.95 for all samples of size n > 500 and it tends rapidly 
to one as the sample size increases. The exact asymptotic distribution of 
^-8t^-i<i<j<n{J2j = i Zi) 2 /ti ~ i + 1) nas recently been derived by Kabluchko 
(2007). 

As it stands the confidence region ([7]) cannot be used as it requires a. We 
use the following default estimate 

(10) a n = median(|y(t 2 ) - y(ti)\,. . . , \y{t n ) - y(t n ^)\)/(^-\0.75)V2) 

where is the inverse of the standard normal distribution function <3?. It 
is seen that a n is a consistent estimate of a for white noise data. For data 
generated under ([2]) a n is positively biased and consequently the coverage 
probability will not decrease. Simulations show that 

(11) P(f € A n (Y n ,l n ,a n ,3)) > 0.95 
for all n > 500 and 

(12) hrn^ inf P(f e An(Y n ,l n ,a n ,3)) = 1. 

In other words A n is a universal, honest and non-asymptotic confidence 
region for /. To separate the problem of specifying the size of the noise 
from the problem of investigating the behaviour of the procedures under the 
model ([2]) we shall always put a n = a for theoretical results. For real data 
and in all simulations however we use the a n of (jlOp . 

The confidence region A n can be interpreted as the inversion of the mul- 
tiscale tests that the mean of the residuals is zero on all intervals / G . 
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A similar idea is to be found in Diimbgen and Spokoiny (2001) who invert 
tests to obtain confidence regions. Their tests derive from kernel estimators 
with different locations and bandwidths where the kernels are chosen to be 
optimal for certain testing problems for given shape hypotheses. The confi- 
dence region may be expressed in terms of linear inequalities involving the 
weighted residuals with the weights determined by the kernels. The confi- 
dence region we use corresponds to the uniform kernel on [0, 1]. Because 
of their multiscale character all these confidence regions allow any lack of 
fit to be localized (Davies and Kovac, 2001; Diimbgen and Spokoiny, 2001) 
and under shape regularization they automatically adapt to a certain degree 
of local smoothness. Universal exact confidence regions based on the signs 
of the residuals sign(y(tj) — g(t{)) rather than the residuals themselves are 
to be found implicitly in Davies (1995) and explicitly in Diimbgen (2003, 
2007) and Diimbgen and Johns (2004). These require only that under the 
model the errors e(t) be independently distributed with median zero. As a 
consequence they do not require an auxiliary estimate of scale such as (|10|) . 
Estimates and confidence bounds based on such confidence regions are less 
sensitive but much more robust. 

3. Shape regularization and local adaptivity. 

3.1. Generalities. In this section we consider shape regularization within 
the confidence region A n . Two simple possibilities are to require that the 
function be monotone or that it be convex. Although much has been writ- 
ten about monotone or convex regression we are not concerned with these 
particular cases. Given any data set y n it is always possible to calculate a 
monotone regression function, for example monotone least squares. In the 
literature the assumption usually made is that the / in (|2|) is monotone 
and then one examines the behaviour of a monotone regression function. 
Although this case is included in the following analysis we are mainly con- 
cerned with determining the minimum number of local extreme points or 
points of inflection required for an adequate approximation. This is STEP 
2 of Mammen (1991) [20]. We shall investigate how pronounced a peak or 
a point of inflection must be before it can be detected on the basis of a 
sample of size n. These estimates are in general conservative but they do 
reflect the real finite sample behaviour of our procedures. We shall also in- 
vestigate rates of convergence between peaks and points of inflection. We 
show that these are local in the strong sense that the rate of convergence at 
a point t depends only on the behaviour of / in a small neighbourhood of t. 
Furthermore we show that in a certain sense shape regularization automat- 
ically adapts to the smoothness of /. All the calculations we perform use 
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only the shape restrictions of the regularization and the linear inequalities 
which determine A n . The mathematics are extremely simple involving no 
more than a Taylor expansion and are of no intrinsic interest. We give one 
such calculation in detail and refer to the appendix for the remainder. 

3.2. Local extreme values. The simplest form of shape regularization is 
to minimize the number of local extreme values subject to membership of 
An- We wish to determine this minimum number and exhibit a function in 
A n which has this number of local extreme values. This is an optimization 
problem and the taut string algorithm of Davies and Kovac (2001) was 
explicitly developed to solve it. A short description of the algorithm used in 
Kovac (2007) is given in the appendix, section 7.3. We analyse the properties 
of any such solution and in particular the ability to detect peaks or points 
of inflection. To do this we consider data generated under the model ([2]) and 
investigate how pronounced a peak of the generating function / of ([2]) must 
be before it is detected on the basis of a sample of size n. We commence with 
the case of one local maximum and assume that it is located at t = 1/2. Let 
I c denote an interval which contains 1/2. For any f n in A n we have 

\= fniti) > Y /(*<) " CT V31og^ + a Z(I C ) 



Uei c 



and hence 



/ 1Q \ f u\ ^ 1 fu \ V31ogn - Z(I C ) 
(13) max fniti) > — 2^ f(k) - a = 



where 



Let // and I r be intervals to the left and right of I c respectively. A similar 
argument gives 

(14) mm fniti) <TTT 2^ /W + a Trff 

and 

(15) mm fn{ti) < — 2^ f(ti) + a = . 
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If now 



1 ^ £f . , y/3]^i+Z{Il) 



> max < — E /(*<) + °- ttt 

rrr E /w + CT Tfpf 

then any function in A n must have a local maximum in Ii U J c U I r . The 
random variables Z(I C ), Z(I\) and Z(I r ) are independently and identically 
distributed -/V(0, 1) random variables. With probability at least 0.99 we have 
Z(Jc) > -2.72, Z(Ii) < 2.72 and Z(I r ) < 2.72 and hence we can replace ([16]) 
by 



It E /(*) 



V3 log n + 2.72 
a- 



j I ^ — ' v " ,/rr 



, l7 x . / 1 ^ ,,,,, y/315^ + 2.72 
I 17 ) > max ^ — 2^ + cr = , 

1 ^ ^3bg^+2J2l 

ttm E /(**) + CT TrrT f 

If we now regularize by considering those functions in A n with the minimum 
number of local extreme values we see that this number must be at least one. 
As / itself has one local extreme value and belongs to A n with probability 
rapidly approaching one we see that with high probability the minimum 
number is one and that this local maximum lies in U I c U I r . 

The condition (|17j) quantifies a lower bound for the power of the peak so 
that it will be detected with probability of at least 0.94 on the basis of a 
sample of size n > 500. The precision of the location is given by the interval 
II U I c U I r . We apply this to the specific function 

(18) f b {t) = b((t - l/2)/0.01) 

where 



(19) b(t) 



\t\ < 1 
otherwise. 
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We denote by f£ n a function in A n which has the smallest number of local 
extreme values. As the function of (|18p lies in A n with probability rapidly 
tending to one and has exactly one local extreme it follows than any such 
must have exactly one local extreme. Suppose we wish to detect the 
local maximum of fb with a precision of 5 = 0.01. As all points in the 
interval [0.49, 0.51] are in a sense the same local maximum we require the 
local maximum of to lie in the interval [0.48, 0.52]. A short calculation 
with (7 = 1 shows that the smallest value of n for which (|17p is satisfied is 
approximately 19500. A small simulation study using the taut string resulted 
in the peak being found with the prescribed accuracy in 99.6% of the 10000 
simulations. 

We now consider a function / which has exactly one local maximum 
situated in t = 1/2 and for which 

(20) -c 2 < f (2) (t) < - Cl <o, tel , 

for some open interval Iq which contains the point t = 1/2. We denote by /* 
a function in A n which minimizes the number of local extremes. For large 
n any such function /* will have exactly one local extreme value which is a 
local maximum situated at t* with 

/ /lognN 1/5N 

(21) |t; _ i/2| = 0/ H 

An explicit upper bound for the constant in Of in terms of c\ and C2 of ([20 
is available. We also have 

(22) f * {t *J> f{1/2 )-O f (C^) 25 ^ 

with again an explicit constant available. In the other direction 



(23) /*(£) < /(1/2) + a(./3l^ + 2.4). 

The proofs are given in the Appendix. 

More generally suppose that / has a continuous second derivative and 
k local extreme values situated at < if < . . . < t e K < 1 with f^ 2 \t%) ^ 
0,k = 1,...,k. If /* € A n now denotes a function which has the smallest 
number of local extreme values of all functions in A n it follows that with 
probability tending to one /* will have k local extreme values located at the 
points < t*1 < . . . < i*^ < 1 with 



(24) |<S-**l = / MM \,k = l,...,K. 
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Furthermore if t\ is the position of a local maximum of / then 

(25) £(O>/(«)-0/((^) 2/5 ) 
whereas if tt is the position of a local minimum of / then 

(26) £(«)</($) + 0,((^) 2/B ). 
In the other direction we have 



(27) /*(C fc ) < /(t|)+^(v / 3To^+ v /31og(8 + K )) 



(28) /*(C fc ) > /(*£) - ^(731^ + ^31og(8 + «) ). 

More precise bounds cannot be attained on the basis of monotonicity argu- 
ments alone. 

3.3. Between the local extremes. We investigate the behaviour of f£ be- 
tween the local extremes where /* is monotone. For any function g : [0, 1] — ► 
R we define 

(29) \\g\\ It0O = sup{|$(t)| : t € I}. 

Consider a point t = i/n between two local extreme values of / and write 
F nk = [i/n, (i + k)/n] with fc> 0. Then 



(30) rjiln) - /(i/n) < is min„ + 2,^^ | 

where £;* r denotes the largest value of k for which /* is non-decreasing on 
I^ k . It follows from f|30|) and the corresponding inequality on the left that 
as long as /* has the correct global monotonicity behaviour its behaviour 
at a point t with f^'(t) 7^ depends only on the behaviour of / in a small 
neighbourhood of t. In particular we have asymptotically 

(31) i/w-/:(t)i<3 4 /v/ 3 i/W(t)|V3^y /3 . 

Furthermore if f^(t) = on a non-degenerate interval I = [ti,t r ] between 
two local extremes then for t\ < t < t r we have I? = [ti,t] and /* = [t,t r ] 
which results in 

(32) \f(t) - f*(t)\ < , r r — ( ^ ) . 

minjv* - fy, V*r - t\ \ n J 
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The same argument shows that if 

\f(t)-f(s)\<L\t-sf 

with < P < 1 then 

(33) \f(t) - f n (t)\ < C L 1 /(2/3+l) (a//S) 2/3/(2/9 + l) (logn/n) ^/(2^+l) 

where 

/ i \ 1/(2/3+1) 

c < (2/3 + l^ 2 ^ 1 ) f — ^ J < 4.327. 

Apart from the value of c this corresponds to Theorem 2.2 of Diimbgen and 
Spokoiny (2001). 

3.4. Convexity and concavity. We now turn to shape regularization by 
concavity and convexity. We take an / which is differentiable with derivative 
fW which is strictly increasing on [0, 1/2] and strictly decreasing on [1/2, 1]. 
We put l c nk = [1/2 - k/n, 1/2 + k/n], P nk = [t, - k/n,U + k/n] with i, + 
k/n < 1/2 — k/n and I l nk = [t r — k/n, t r + k/n] with t r — k/n > 1/2 + k/n. 
Corresponding to (fT7|) if / satisfies 



min f {1) {t)/n- (2fj( v / 31og n + 2.72/^)) /k 3/2 

> max J max/ (1) (t)/n + (2a{y / 3logn + 2.72/V2)) /k 3/2 , 

(34) max f (l \t)/n + (2a(v^loi^ + 2.72/>/2)) /fe 3/2 . 

then it follows that with probability tending to at least 0.99 the first deriva- 
tive of every differentiable function f n £ A n has at least one local maximum. 
Let /* be a differentiable function in A n whose first derivative has the small- 
est number of local extreme values. Then as / belongs to A n with probabil- 
ity tending to one it follows that f n has exactly one local maximum with 
probability tending to at least 0.99. Suppose now that / has a continuous 
third derivative and k points of inflection located at < t\ < . . . < t\ with 

f^(t})=0 and .f :,: ;0i / 0.., 1 k. 

If /* has the smallest number of points of inflection in A n then if / € A n 
with probability tending to one it follows that with probability tending to 
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one /* will have k points of inflection located at < i*\ < . . . < i* ? K < 1. 
Furthermore corresponding to (|24p we have 

(35) \t* n \ -41 = 0/^ (^) 1/7 ^j ,k = l,...,K. 

Similarly if t\. is a local maximum of then corresponding to ()25f) we have 

(36) /^(a)>/ (1) (*i)-o/((^) 2/7 ) 

and if it is a local minimum of then corresponding to (|26p we have 

(37) ^l^^ + O,^)^. 

3.5. Between points of inflection. Finally we consider the behaviour of 
/* between the points of inflection where it is then either concave or convex. 
We consider a point t = i/n and suppose that /* is convex on l r nk = [i/n, (i+ 
2k) /n]. Corresponding to ([30]) we have 



(38) f«%/n) - PHi/n) < ^ j ~||/ (2) lk fc ,oo + 4an^ 



31ogn 



where /c* r is the largest value of k such that /* is convex on [i/n, (i + 2k)/n\ 
Similarly corresponding to (|77p we have 



(39) /W(i/n) - f: W (i/n) < mm \-\\f^\\ Il+ Aan J 

l<k<k* 1 TO nk' y 



3 log n 



where I l nk = [i/n — 2k /re, i/n] and k*^ is the largest value of k for which /* 
is convex on I l nk . If f^ 2 \t) / Owe have corresponding to (f3T|) 

(40) |/ n *«(i) - /W(t)| < 4.36a 2 / 5 |/ (2) W! 3/5 ( — ^ 

V n 

as re tends to infinity. If f^ 2 \t) = on the non-degenerate interval I = [ti,t r ] 
then for t[ < t < t r we have corresponding to ([32]) 



(4i) -/") W |< ffi ,wn (— )' /2 

mm{(t — t;) 01 ' , (i r — £) d '^ } \ n J 
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The results for /* itself are as follows. For a point t with f^ 2 \t) ^ and an 
interval l r nk = [t, t + 2k /n] where /* is convex we have 



where a{f,t) = 4.36a 2 / 5 \f^(t)\ 3 / 5 . If we minimize over k and repeat the 
argument for a left interval we have corresponding to (|3ip 

(42) \f*(t) - f(t)\ < 11.58a 4 / 5 |/ (2) W| 1/5 (^) 2/5 • 

Finally if / (2) (*) = for t in the non-degenerate interval [ti,t r ] we have 
corresponding to (f32l) for t[ < t < t r 



(43) mt)-m\< ..J^f^ i^f. 

mm{\/t - tl, yjt r - t ) V w / 

If the derivative of /satisfies < L\t-a\ p with < /3 < 1 

then corresponding to (f33j) we have 

lognX/3/(2/3+3) 

n / 



|/*(U( t ) - /(D(t)| < cL 3 /^ 3 )^//?) 2 ' 3 /^ 3 ) 
with 



/a /o\ (/3+2)/(2/3+3) / « v 3/(2/3+3) 

There is of course a corresponding result for /* itself. 

4. Regular izat ion by smoothness. We turn to regularization by smooth- 
ness. 

4.1. Minimizing total variation. We define the total variation of the fcth 
derivative of a function g evaluated at the design point t $ = z/n by 



(44) TF(/):= £ |A( fc+1 )( 5 (i/n)) 

i=fc+2 



fc > 



where 

(45) A^ k+1 \g{i/n)) = A^(A^(g(i/n))) 
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with 

AW(s(i/n)) = n(g(i/n) - g((i - l)/n)). 
Similarly the supremum norm Hg^Hoo is defined by 

(46) lb (fe) l|oo = max \^ k \g(i/n))\. 

i 

Minimizing either TV(g k ) or Hg^Hoo subject to g £ A n leads to a linear 
programming problem. Minimizing the more traditional measure of smooth- 
ness 

Cg^itfdt 
Jo 

subject to g £ A n leads to a quadratic programming problem which is 
numerically much less stable (cf. Davies and Meise, 2005) so we restrict 
attention to minimizing TV(g k ) or ||</"||oo- 

Minimizing the total variation of g itself, k = 0, leads to piecewise con- 
stant solutions which are very similar to the taut string solution. In most 
cases the solution also minimizes the number of local extreme values but this 
is not always the case. The upper panel of Figure [T] shows the result of mini- 
mizing TV(g) for the Doppler data of Donoho and Johnstone (1994). It has 
the same number of peaks as the taut string reconstruction. The lower panel 
of Figure Q] shows the result of minimizing TV(g^). The solution is a linear 
spline. Figure [T] and the following figures were obtained using the software 
of Kovac (2007). Just as minimizing TV(g) can be used for determining the 
intervals of monotonicity so we can use the solution of minimizing TV(g^) 
to determine the intervals of concavity and convexity. Minimizing TV(g^) 
or Hs^lloo f° r larger values of k leads to very smooth functions but the 
numerical problems increase. 

4.2. Smoothness and shape regularization. Regularization by smoothness 
alone may lead to solutions which do not fulfill obvious shape constraints. 
Figure [2] shows the effect of minimizing the total variation of the second 
derivative without further constraints and the minimization with the impo- 
sition of the taut string shape constraints. 

4.3. Rates of convergence. Let f n be such that 

(47) ||/,l 2) ||oo<||5 {2) ||oo VgeAn. 

For data generated under with / satisfying ||/^ 2 ^||co < oo it follows that 
with probability rapidly tending to one 

(48) ||/( 2) ||oo<||/ (2) ||oo. 
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Fig 1. Minimization ofTV(g) (upper panel) andTV(g^) (lower panel) subject to g 6 A n 
for a noisy Doppler function. 

A Taylor expansion and a repetition of arguments already used leads to 

(49) \f n (i/n) - f(i/n)\ < 3.742\\f^\\^ {^)" 
on an interval 

0.58a 2 / 5 (logn) 1 /V(||/P)||V5 n i/5 )) 1 _ o.58a 2 / 8 (Iogn) 1 /V(||/W||^ n V 8 ); 

with a probability rapidly tending to one. A rate of convergence for the first 
derivative may be derived in a similar manner and results in 

(50) \J n {i/n) - /W(</n)| < 4.251||/( 2 ) \\% 5 a^ (^f^'" 
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Fig 2. The minimization of the total variation of the second derivative with (solid line) 
and without (dashed line) the shape constraints derived from the taut string. The solution 
subject to the shape constraints was also forced to assume the same value at the local 
maximum as the taut string solution. 

on an interval 

2.15a 2 / 5 (logn) 1 /V(||/(2)||^ n i/5 )) , _ 2 . 1B ^/6 (Iogn) i/B /( || / ( 2 )||2/B n i/B ) - . 
5. Confidence bands. 

5.1. The problem. Confidence bounds can be constructed from the con- 
fidence region A n as follows. For each point ti we require a lower bound 
lb n (y n ,ti) = lb n (ti) and an upper bound ub n (y n ,ti) = ub n (ti) such that 

(51) B n {y n ) = {g : lb n {y n , U) < gfc) < ub n {y n ,U),i = 1, . . . n} 
is an honest non-asymptotic confidence region 

(52) P(f G B n (Y n {f))) > a for all / G T n 

for data Y n (f) generated under ([2]). In a sense the problem has a simple 
solution. If we put 

(53) lb n {ti) = y{ti) - cr n v^logre, vb n {U) = y(U) + a n s/3 log n , 

then A n C B n and §2$ for all holds with T n = {/ 1 / : [0, 1] -► oo}. Such 
universal bounds are too wide to be of any practical use and are consequently 
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not acceptable. They can only be made tighter by restricting T n by impos- 
ing shape or quantitative smoothness constraints. A qualitative smoothness 
assumption such as 

(54) ^ = {/:||/ (2) IU<oo} 

does not lead to any improvement of the bounds (j53|) . They can only be 
improved by replacing (p3j) by a quantitative assumption such as 



(55) ^n = {/:||/ (2) ||oo<60}. 
5.2. Shape regularization. 

5.2.1. Monotonicity. As an example of a shape restriction we consider 
bounds for non-decreasing approximations. If we denote the set of non- 
increasing functions on [0, 1] by 

A4 + = {g : g : [0, 1] — ► R, g non-decreasing} 

then there exists a non-decreasing approximation if and only if 

(56) M + n A n + 0. 

This is the case when the set of linear inequalities which define A n together 
with g(ti) < . . . < g(t n ) are consistent. This is once again a linear pro- 
gramming problem. If (|56p holds then the lower and upper bounds are given 
respectively by 

(57) lb n (ti) = mm{g(ti) : g £ M + D A n }, 

(58) ub n (ti) = max {g(ti) : g e M + f) An}- 

The calculation of lb n {ti) and ub n (ti) requires solving a linear programming 
problem and although this can be done it is practically impossible for larger 
sample sizes using standard software because of exorbitantly long calculation 
times. If the family of intervals Z n is restricted to a wavelet multiresolution 
scheme then samples of size n = 1000 can be handled. Fast honest bounds 
can be attained as follows. If g £ M + nA n then for any i and k with i+k < n 
it follows that 

1 k 

Vk + 1 g(U) > V Y n (U-j) ~ ^logra. 
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From this we may deduce the lower bound 
(59) lb n (ti) = max 

0<k<i- 

with the corresponding upper bound 



' 3 log n 



o<*<i-ilJfc + i ^ J V k + l 



( 1 /31ogra 
(60) ub n (U) = ^ ^_gy n(ti+i ) +^ TTr 

Both these bounds are of algorithmic complexity 0(n 2 ). Faster bounds can 
be obtained by putting 



(61) lb n (ti) = max „ /n , .. V *n(*i-j) ~ ^ 



o<<?(fc)<i-i \ 0(Jfe) + 1 ^ J V ^) + 1 



(62) ub n {ti) = min ,,,,>. , -, y,Y n (t i+j ) + a 




3 log n 



3 log n 



o<e(fc)<n-i \ e(ife) + 1 ^ ' V + 1 



where 9{k) = \9 k — lj for some 9 > 1. These latter bounds are of algorithmic 
complexity O(nlogn). The fast bounds are not necessarily non-decreasing 
but can be made so by putting 

ub n (ti) = min (ub n (ti),ub n (t i+1 )), i = n - 1, . . . , 1, 
lb n (ti) = max (lb n (ti),lb n {ti-\)), i = 2,...,n. 

The upper panel of Figure [3] shows data generated by 

(63) Y(t) = exp(5t) + 5Z(t) 

evaluated on the grid ij = z/1000, a = 1,... ,100 together with the three 
lower and three upper bounds with a replaced by a n of (fTOl) . The lower 
bounds are those given by (|57p with X n a dyadic multiresolution scheme, 
(I59p and (|61h with 8 = 2. The times required for were about 12 hours, 19 
seconds and less than one second respectively with corresponding times for 
the upper bounds ([58]) . ([60]) and (f62|) . The differences between the bounds 
are not very large and it is not the case that one set of bounds dominates 
the others. The methods of Section [3] can be applied to show that all the 
uniform bounds are optimal in terms of rates of convergence. 
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Fig 3. The function f(t) = exp(5t) degraded with N(0, 25) noise together with monotone 
confidence bounds (upper panel) and convex confidence bounds (lower panel). The three 
lower bounds in the upper panel are derived from J57] ), A59[) and H61[) and the corresponding 
upper bounds are H58)) . (60\) and S62\) . The lower bounds for the lower panel are \61$ , (68\) 
and | [70| ) and the corresponding upper bounds f65|) , \60(l and \69(l 

5.2.2. Convexity. Convexity and concavity can be treated similarly. If 
we denote the set of convex functions on [0, 1] by C + then there exists a 
convex approximation if and only if 

C+nAn^ 0. 

Assuming that the design points are of the form ti = i/n this will be the 
case if and only if the set of linear constraints 

g{t i+ i) - g(U) > g(U) - g(U-l), i = 2, . . . , n - 1, 

are consistent with the linear constraints which define A n - Again this is a 
linear programming problem. If this is the case then lower and upper bounds 
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are given respectively by 

(64) lb n {ti) = min {g(U) : g E C+ n An }, 

(65) ub n (ti) = max{g(U) : g £ C + nA n } 

which again is a linear programming problem which can only be solved 
for relatively small values of n. An honest but faster upper bound can be 
obtained by noting that 

1 k 

g(i/n) < 2k + 1 9{(i+j)/n), k < mm(i -l,n-i) 

j=-k 

which gives rise to 

/ 3 log n 



( 1 k 

(66) ub n {ti)= min — V Y n (t i+j ) +a 

0<fc<mia(t-l,n-t) I 2k + 1 f^ fc J y 2k + 1 

A fast lower bound is somewhat more complicated. Consider a function 
f n G C + Pi »4 n and two points (i/n, f n {i/ n )) and ((i + k)/n, ub n ((i + k)/n)). 
As + k)/n) < ub n ({i + k)/n) and / n is convex it follows that / n lies 
below the line joining (i/n, f n (i/n)) and ((i + k)/n,ub n ((i + k)/n)). From 
this and / n G -4 n we may derive a lower bound by noting 

(67) lb n {ti) < lb n (ti,k) := 



BW, ~ ^L, Y n(ti+j) - ub n(ti+k)U + l)/(2fe) - crj3logn/j 

i<j<k yj l=1 

for all A;, — i + 1 < k < n — i. An honest lower bound is therefore given by 

(68) IK(U) = max lb n (U,k). 

—i+l<k<n—i 

The algorithmic complexity of ub n as given by is 0(n 2 ) whilst that of 
the lower bound (|68p is 0(n 3 ). Corresponding to (|62p we have 

(69) 

7 / \ / 1 ir / \ / 3 log n 

^ (t " )= o< 9W <ZVi, n ^) (^TT^) n(tl+j) + a V^)TT 

and to (ED 



(70) lb n {ti)= max lb n (t h 9{k)). 

-i+l<$(k)<n-i 
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where 

(71) lb n {U)<lb n {U,9{k)):= 
( 1 Hj) 

i wr^^i+j) - nb n {t i+e[k) )(e(j) + i)/(29(k)) - 

ay/3\ogn/0(j) ) 

with = [# fc J for some 9 > 1. The algorithmic complexity of (|69p is 
O(nlogn) and that of ((ZDJ) is 0(ra(logn) 2 ). 

The lower panel of Figure [3] shows the same data as in the upper panel but 
with the lower bounds given by (j64j) withX n a dyadic multiresolution scheme, 
(|68p and ([TO]) and the corresponding upper bounds (|65|) . (f6o| and ([69j) . The 
calculation of each of the bounds (164p and (I65h took about 12 hours. The 
lower bound (|68p took about 210 minutes whilst (|70p was calculated in less 
than 5 seconds. The lower bound (I64p is somewhat better than (I68p and (I70|) 
but the latter two are almost indistinguishable. 

5.2.3. Piecewise monotonicity. We now turn to the case of functions 
which are piecewise monotone. The possible positions of the local extremes 
can in theory be determined by solving the appropriate linear programming 
problems. The taut string methodology is however extremely good and very 
fast so we can use this solution to identify possible positions of the local 
extremes. The confidence bounds depend on the exact location of the lo- 
cal extreme. If we take the interval of constancy of the taut string solution 
which includes the local maximum we may calculate confidence bounds for 
any function which has its local maximum in this interval. The result is 
shown in the top panel of Figure H] where we used the fast bounds (|6ip and 
(I62p(|6ip and (I62p with 6 = 1.5. Finally if we use the mid-point of the taut 
string interval as a default choice for the position of a local extreme we ob- 
tain confidence bounds as shown in the lower panel of Figure HI The user can 
of course specify these positions and the programme will indicate if they are 
consistent with the linear constraints which define the approximation region 

5.2.4. Piecewise concave-convex. We can repeat the idea for functions 
which are piecewise concave-convex. There are fast methods for determin- 
ing the intervals of convexity and concavity based on the algorithm devised 
by Groeneboom (1996) but in this section we use the intervals obtained 
by minimizing the total variation of the first derivative (Kovac, 2007). The 
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Fig 4. Confidence bounds without (upper panel) and with (lower panel) the specification 
of the precise positions of the local extreme values. The positions in the lower panel are 
the default choices obtained from the taut string reconstruction (Kovac 2007). The bounds 
are the fast bounds 161\) and 162)) with 6 = 1.5. 



upper panel of Figure [5] shows the result for convexity /concavity which cor- 
responds to Figure HI Finally the lower panel of Figure [5] shows the result of 
imposing both monotonicity and convexity /concavity constraints. In both 
cases the bounds used are the fast bounds (f69l) and (JTOj) with 6 = 1.5. 

5.2.5. Sign-based confidence bounds. As mentioned in Section [2.21 work 
has been done on confidence regions based on the signs of the residuals. 
These can also be used to calculate confidence bands for shape-restricted 
functions. We refer to Davies (1995), Dumbgen (2003, 2007) and Diimbgen 
and Johns (2004). 

5.3. Smoothness regularization. We turn to the problem of constructing 
lower and upper confidence bounds under some restriction on smoothness. 
For simplicity we take the supremum norm ||<7^||oc to be the measure of 
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Fig 5. Confidence bounds with default choices for the intervals of convexity /concavity 
(upper panel based on H69[l and \70\ ) with 8 — 1.5) and combined confidence bounds for 
default choices of intervals of monotonicity and convexity/concavity. 

smoothness for a function g. The discussion in Section f5. II shows that honest 
bounds are attainable only if we restrict / to a set J- n = {g : \\g^ ||oo < K} 
with a specified K. We illustrate the idea using data generated by ([2]) with 
f(t) = sin(47ri) and a = 1. The minimum value of ||g( 2 )||oo is 117.7 which 
compares with 16-7T 2 = 157.9 for / itself. The upper panel of Figure [6] shows 
the data together with the resulting function /*. The bounds under the 

~(2) 

restriction ||/A ||oo < 117.2 coincide with the function /* itself. The middle 
panel of Figure [6] show the bounds based on \\g^ ||oo ^ K for 



K = 137.8(= (117.7 + 157.9)/2), 157.9 and 315.8(= 2 x 157.9). 
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Just as before fast bounds are also available. We have for the lower bound 
for given K 

(72) m , n) < y_± Y{{i+j y n) + g) 2 K + c^^ 

and for the upper bound 
(73) 

„ 6(i/n) > mf[ y_± Y{(i+j)M _ ^ v . 

As it stands the calculation of these bounds is of algorithmic complexity 
0(n 2 ) but this can be reduced to 0(n log n) by restricting k to be of the form 
9 m . The method also gives a lower bound for ||</ 2 )||oo f° r 9 to be consistent 
with the data. This is the smallest value of K for which the lower bound 
lb lies beneath the upper bound ub. If we do this for the data of Figure 
with 9 = 1.5 then the smallest value is 104.5 as against the correct bound of 
115.0. The lower panel of Figure [6] shows the fast bounds for the same data 
and values of K. 
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7. Appendix. 

7.1. Proofs of Section\3_ 



7.1.1. Proof of Let A; be such that I c = [1/2-fe/n, 1/2+fc/n] C J . 

A Taylor expansion together with (|20p implies after some manipulation 

1 ^ ... , V3bg^ + 2.72 



> /(1/2) 



/c 2 V31ogn + 2.72 
2^ C2 " ° 



2k 



and on minimizing the right hand side of the inequality with respect to k 
we obtain 



TFT E " CT 

1 c| tie/c 



V3 log n + 2.72 



4/5 



31ogn+2.72J jn 



2/5 



(74) > /(l/2)-l.lcfV 4 / 5 / 

This inequality holds as long as I c = [1/2 — k n /n, 1/2 + k n /n] C /o with 

2/5 



(75) 



0.66c^ 2/ V/ 5 n 4 / 5 (V31oi^ + 2.72) 

If we put // = [1/2 — (ji + l)k n /n, 1/2 — r]k n /n] similar calculations give 
1 

2/c~ 

ueu »-..,_ 

A; 2 



, ^ V3log^ + 2.72 



< /(1/2) - ^« + <, 



V31ogn + 2.72 



2A: 



and hence 

1 ^ V3log^ + 2.72 

ra£ /( * )+ *^Kr- 



> /(1/2) 



C2 /5 <r 4/5 (\/31ogn +2.72) 



x 4/5 



n 



2/5 



0.2178t? 2 ci/c 2 - 1.23 



with the same estimate for I r = [1/2 + f]k n /n, 1/2 + (77 + l)k n /n]. If we put 
77 = 3.4y / C2/ci and 

(76) /„ := [1/2 - (77 + l)fc„/n, 1/2 + (77 + l)fc„/n] C / 

then all estimates hold. Because of (|75p this will be the case for n sufficiently 
large. This implies that (|17p holds for sufficiently large n and in consequence 
any function f n G *4 n has a local maximum in I n . 
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7.1.2. Proofs of HM) and From (p} and (7JD we have 



f*(t* n ) > /(1/2) - l.l4 /5 a 4 / 5 (v^loi^ + 2.72) 4/5 /n 



2/5 



which is the required estimate (|22p . To prove (|23p we simply note 



m* n ) < f(t* n ) + <rZ(t*) + a^3]og^ < /(1/2) + a(731og^ + 2.4). 

7.1.3. Proof o/ (ED and (EIP. As /* G A by definition and / G Ai with 
probability tending to one we have for the interval I r nk = [i/n, (i + k — l)/n] 



-7= £ /n((i + 3) In) <^=E /((* + J')A0 + 2ay3loi"n 



from which it follows that 



/:(Vn)</(i/n) + ^||/( 1 )|| /;fc , 0O + 2c7y f 



31ogn 



which proves l[30|) . Similarly for the intervals J^ fc = [(i — k + l)/n, i/n] we 
have 



(77) f(i/n) - f* n {i/n) < min <j tyi^ +2 J^I!l 

i<k<k* 1 n V k 



■ 



We note that ([30]) and ([77]) imply that /* adapts automatically to / to give 
optimal rates of convergence. If /(!)(£) ^ then it may be checked that the 
lengths of the optimal intervals and A. tend to zero and consequently 

ll/ (1) IK,oo-|/ (1) W|-H/ (1) lk^°o- 

The optimal choice of k is then 

, _ / 3a 2 n 2 log n \ 1/3 _ 

n ~ I i/ (1) wi 2 y 71 

which gives 

AUnfeJ- |/(l)( t )|2/3 { n J ~ X ^nk) 
from which (1311) follows. 
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7.2. Proofs of SectionUQ 



7.2.1. Proof of Then adapting the arguments used above we have 

for any differentiable function /„ 6 A n 



1 k 



-= (/n(l/2 + i/n) - /n(l/2 - fc/n + i/n)) 
v fc i=l 



1 fc 

> — ^(/(l/2 + i/n) - f (1/2 -k/n + i/n)) 
-2a(Vsh^ + Z(I c nk )/V2) 

which implies 

(78) max/^^/n > min /«(i)/n - (2^(^31^ + A 3 / 2 . 

nfe nfe 

Similarly if i^, = [tj — /c/n, ^ + fc/n] with ti + k/n < 1/2 — k/n we have 

(79) min ^ (1) (t)/n < max /«(i)/n + (2a( v / 3bg^ + Z(i^)/v^)) A 3/2 

and for I l nk = [t r — k/n,t r + k/n) with t r — k/n > 1/2 + k/n we have 

(80) mm /*«(t)/n < max /«(i)/n + (2(7(^^1^ + Z(F nk )/V2))/k 3 / 2 . 

rife nfe 

Again following the arguments given above we may deduce from (I78p . (179D 
and ([80]) . that for sufficiently large n it is possible to choose I l nk , I^ k and l r nk 



so that (J34D holds. 

7.2.2. Proof of (33$. We have 



1 fc 



^T,{ti(k/n + i/n)-fM/n)) 

k 

^ - 7 =J2(f(k/n + i/n)-f(i/n))+2aV^ 
Vk j=1 



*(T) 

and f n is non-decreasing on I T nk we deduce 

1.3/2 i k 

< -= J] (/(fc/n + i/n) - /(i/n)) + 2av^bg^. 

n Vk j=l 
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A Taylor expansion for / yields 



</ (1) (t) + ?ll/ (2) ll 



nk 



,00 



+ 2cm 




31ogn 



from which (|38p follows. 

7.3. The taut string algorithm of Kovac (2007). We suppose that data 
yi,...,y n at time points t\ < t% < • ■ ■ < t n are given and first describe 
how to calculate the taut string approximation given some tube widths 
Ao, Ai, . . . , A n . Subsequently we describe how to determine these tube widths 
using a multiresolution criterion. Lower and upper bounds of a tube on [0, n] 
are constructed by linear interpolation of the points (i, Y{ — = 0, . . . n 
and (i,Yi + = 0, . . . , n respectively where Yq = and Y& = Y^-i + yt 
for k = 1, ... ,n. We consider a string F n forced to lie in this tube which 
passes through the points (0,0) and (l,Y n ) and is pulled tight. An explicit 
algorithm for doing this with computational complexity O(n) is described 
in the Appendix of Davies and Kovac (2001). The taut string F n is linear 
on each interval [i — 1, i] and its derivative /j = F n (i) — F n (i — 1) is used as 
an approximation for the data at ij. 

Our initial tube widths are Ao = A n = and Ai = A2 = • • • = A n = 
max(lo) ■ • • > Y n ) — min(lo) • ■ ■ , Y n ). We consider the dyadic index set family 



which consists of at most n subsets of {1, . . . ,n}. Given some taut string 
approximation f%,...,f n using tube widths Aq , • . . , A n we check whether 



iei 

is satisfied for each I £ 2. If this is not the case we generate new tube widths 
Ao, Ai, . . . , A n by setting Ao = A n = and for i = 1, . . . , n — 1 

~ J Aj, if (fHT|) is satisfied for all / G Z with i G / or i + II 
[Aj/2, otherwise. 



Then we calculate the taut string approximation corresponding to these new 
tube widths, check (|8ip . possibly determine yet another set of tube widths 
and repeat this process until eventually (18ip is satisfied for the all I El. 



1= |J {{2^ + i,...,2^ + i)}n{i,..., 



n}}\0 



(81) 
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